1.0  Introduction

An operating program accepts as input an essay of up to
300 words in length, and yields as output an essay-type
paraphrase that is a nonredundant summary of the content of the
source text.

Although no transformations are used, the content of several
sentences in the input text may be combined into a single sentence
in the output. The format of the output essay may be varied by
adjustment of program parameters. In addition, the system
occasionally inserts subject or object pronouns in its paraphrases
to avoid repetitious style.

The components of the system include a phrase structure and
dependency parser, a routine for establishing dependency links
across sentences, a program for generating coherent sentence
paraphrases randomly with respect to order and repetition of source
text subject matter, a control system for determining the logical
sequence of the paraphrase sentences, and a routine for inserting
pronouns.

The present version of the program requires that individual
word class assignments be part of the information supplied with a
source text, and also, that the grammatical structure of the
sentences in the source conform to the limitations of the parsing
system.

2.0  Dependency-Phrase Structure Parsing System

The parsing system used in the automatic essay writing
experiments performed a phrase structure and dependency analysis
simultaneously. Before describing its operation it will be useful
to explain the operation of a typical phrase structure parsing
system.

Cocke of IBM, Yorktown, developed a program for the recognition
of all possible tree structures for a given sentence. The program
requires a grammar of binary formulas for reference. While Cocke
never wrote about the program himself, others have described its
operation and constructed grammars to be used with the program.¹,²

The operation of the system may be illustrated with a brief
example. Let the grammar consist of the rules in Table 1;
let the sentence to be parsed be:

A  B  C  D


The grammar is scanned for a match with the first pair of
entities occurring in the sentence. Rule 1 of Table 1,
A + B = P, applies. Accordingly A and B may be linked together in
a tree structure and their linking node labeled P.

(a)   [P A B]  C  D

But the next pair of elements, B + C, is also in Table 1. This
demands the analysis of an additional tree structure.

1.  A + B = P
2.  B + C = Q
3.  P + C = R
4.  A + Q = S
5.  S + D = T
6.  R + D = U

Table 1

Illustrative Rules for Cocke's Parsing System

(b)   A  [Q B C]  D
These two trees are now examined again. For tree (a), the
sequence P + C is found in Table 1, yielding

(a)   [R [P A B] C]  D

For tree (b), the pair A + Q is found in Table 1, but not
the sequence Q + D. The result here is:

(b)   [S A [Q B C]]  D

Further examination of tree (a) reveals that R + D is an entry in
Table 1.

(a)   [U [R [P A B] C] D]

In tree (b), S + D is found to be in Table 1.

(b)   [T [S A [Q B C]] D]

The analysis has yielded two possible tree structures for the
sentence, A B C D. Depending upon the grammar, analysis of
longer sentences might yield hundreds or even thousands of alternate
tree structures.

Alternatively, some of the separate tree structures might not
lead to completion. If grammar rule 6 of Table 1, R + D = U, were
deleted, the analysis of tree (a) in the example could not be
completed. Cocke's system performs all analyses in parallel and
saves only those which can be completed.
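
The exhaustive analysis just described can be sketched in a few lines. The following is only an illustration in a modern language, not Cocke's program: it assumes a chart indexed by word spans (the function name `all_parses` and the tuple encoding of trees are inventions of this sketch), fills it bottom-up with every pair that matches a rule of Table 1, and keeps only the trees that span the whole sentence.

```python
from itertools import product

# Table 1 rules: (left, right) -> label of the linking node.
RULES = {
    ("A", "B"): "P",
    ("B", "C"): "Q",
    ("P", "C"): "R",
    ("A", "Q"): "S",
    ("S", "D"): "T",
    ("R", "D"): "U",
}


def all_parses(words, rules=RULES):
    """Return every complete binary tree over `words` as nested tuples."""
    n = len(words)
    # chart[i][j] holds (label, tree) pairs covering words[i:j].
    chart = [[[] for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1].append((word, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                # Try every left/right pair that meets at position k.
                for (ll, lt), (rl, rt) in product(chart[i][k], chart[k][j]):
                    parent = rules.get((ll, rl))
                    if parent is not None:
                        chart[i][j].append((parent, (parent, lt, rt)))
    # Partial analyses that never reach the full span are simply dropped.
    return [tree for _, tree in chart[0][n]]


parses = all_parses(["A", "B", "C", "D"])
for tree in parses:
    print(tree)
```

With the six rules of Table 1 the sketch returns the two trees of the example, one rooted in T and one in U. Removing the rule R + D = U leaves only the tree rooted in T, since the other analysis can no longer be completed.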

The possibility of using a parsing grammar as a generation
grammar is described in section 3.

2.1  Phrase Structure Parsing with Subscripted Rules

The phrase structure parsing system devised by the author makes
use of a more complex type of grammatical formula. Although the
implemented system does not yield more than one of the possible
tree structures for a given sentence (multiple analyses are possible
with program modification) it does contain a device that is an
alternative to the temporary parallel analyses of trees that cannot
be completed.

The grammar consists of a set of subscripted phrase structure
formulas as, for example, in Table 2. Here 'N' represents a noun or
noun phrase class, 'V' a verb or verb phrase class, 'Prep' a
preposition class, 'Mod' a prepositional phrase class, 'Adj' an
adjective class, and 'S' a sentence class. The subscripts determine
the order and limitations of application of these rules when
generating as well as parsing. The use of the rules in parsing may
be illustrated by example.

1.  Art0 + N2 = N3
2.  Adj0 + N2 = N2
3.  N1 + Mod1 = N1
4.  V1 + N2 = V2
5.  Prep0 + N3 = Mod1
6.  N3 + V3 = S1

Table 2

Phrase Structure Rules

Consider the sentence:

'The fierce tigers in India eat meat.'

Assuming one has determined the individual parts of speech for
each word:

Art0   Adj0     N0       Prep0   N0      V0    N0
The    fierce   tigers   in      India   eat   meat

The parsing method requires that these grammar codes be examined in
pairs to see if they occur in the left half of the rules of Table 2.

If a pair of grammar codes in the sentence under analysis matches one
of the rules and at the same time the subscripts of the components
of the Table 2 pair are greater than or equal to those of the
corresponding elements in the pair in the sentence, the latter pair
may be connected by a single node in a tree, and that node labeled
with the code in the right half of the rule in Table 2.
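
The matching condition can be stated compactly in code. The sketch below is an illustration only: the rule list is transcribed from the rules quoted in this section, while the names `split_code`, `rule_applies` and `match` are assumptions of the sketch. It covers the subscript test alone, not the precedence rules discussed later.

```python
import re

# Rules quoted in this section: ((left, right), result), numbered from 1.
TABLE_2 = [
    (("Art0", "N2"), "N3"),
    (("Adj0", "N2"), "N2"),
    (("N1", "Mod1"), "N1"),
    (("V1", "N2"), "V2"),
    (("Prep0", "N3"), "Mod1"),
    (("N3", "V3"), "S1"),
]


def split_code(code):
    """Split a code such as 'Prep0' into its class and integer subscript."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", code)
    return m.group(1), int(m.group(2))


def rule_applies(rule_pair, text_pair):
    """Classes must agree; each rule subscript must be >= the text subscript."""
    for rule_code, text_code in zip(rule_pair, text_pair):
        rule_class, rule_sub = split_code(rule_code)
        text_class, text_sub = split_code(text_code)
        if rule_class != text_class or rule_sub < text_sub:
            return False
    return True


def match(text_pair, rules=TABLE_2):
    """Return (rule number, result code) for the first applicable rule."""
    for number, (pair, result) in enumerate(rules, start=1):
        if rule_applies(pair, text_pair):
            return number, result
    return None


print(match(("Art0", "Adj0")))  # no rule has this left half
print(match(("Adj0", "N0")))    # accepted by the adjective rule
print(match(("N2", "Mod1")))    # rejected: rule subscript 1 is below 2
```

The last call shows the limiting effect of the subscripts: a noun unit that has already been raised to N2 can no longer take a Mod1 attribute, because the N1 of the rule is smaller than the N2 of the text.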

Going from left to right (one might start from either direction),
the first pair of codes to be checked is Art0 + Adj0. This sequence
does not occur in the left half of any rule.

The next pair of codes is Adj0 + N0. This pair matches the
left half of rule 2 in Table 2, Adj0 + N2 = N2. Here the subscripts
in the rule are greater than or equal to their counterparts in the
sentence under analysis. Part of a tree may now be drawn.

The  [N2 fierce tigers]  in  India  eat  meat

The next pair of codes to be searched for is N0 + Prep0.
This is not to be found in Table 2.

The following pair, Prep0 + N0, fits rule 5, Table 2,
Prep0 + N3 = Mod1. The subscript rules are not violated, and
accordingly, the sentence structure now appears as:

The  [N2 fierce tigers]  [Mod1 in India]  eat  meat

The next pair of codes, N0 + V0, also appears in Table 2,
N3 + V3 = S1. But if these two terms are united, the N0 would be
a member of two units. This is not permitted, e.g.,

[Mod1 in India]  together with  [S1 India eat]

When a code seems to be a member of more than one higher
unit, the unit of minimal rank is the one selected. Rank is determined
by the lowest subscript if the codes are identical. In this case,
where they are not identical, S1 (sentence) is always higher than a
Mod1 or any code other than another sentence type. Accordingly, the
union of N0 + V0 is not performed. This particular device is an
alternative to the temporary computation of an alternate tree
structure that would have to be discarded at a later stage of analysis.

July 21, 1964          -10-          SP-1602/001/00

The next unit, V0 + N0, finds a match in rule 4 of Table 2,
V1 + N2 = V2, yielding:

The  [N2 fierce tigers]  [Mod1 in India]  [V2 eat meat]

One complete pass has been made through the sentence. Successive
passes are made until no new units are derived. On the second pass,
the pair Art0 + Adj0, which has already been rejected, is not
considered. However, a new pair, Art0 + N2, is now found in rule 1
of Table 2, Art0 + N2 = N3. The tree now appears as:

[N3 The [N2 fierce tigers]]  [Mod1 in India]  [V2 eat meat]


Continuing, the next pair accounted for by Table 2 is N0 + Mod1
which is within the domain of rule 3, N1 + Mod1 = N1. Here the
subscripts of the grammar rule are greater than or equal to those in
the text entities. Now the N0 associated with 'tigers' is already
linked to an Adj0 unit to form an N2 unit. However, the result of
rule 3 in Table 2 is an N1 unit. The lower subscript takes
precedence; accordingly the N2 unit and the N3 unit of which it
formed a part must be discarded, with the result:

The  fierce  [N1 tigers [Mod1 in India]]  [V2 eat meat]

On the balance of this scan through the sentence no new structures
are encountered. A subsequent pass will link Adj0 to N1, producing an
N2 unit. Eventually this N2 unit will be considered for linkage with
V2 to form a sentence, by rule 6 of Table 2. This linkage is rejected
for reasons pertaining to rules of precedence.

A subsequent pass links Art0 with this N2 to form N3 by rule 1 of
Table 2. This N3 is linked to V2 by rule 6 of Table 2.

[S1 [N3 The [N2 fierce [N1 tigers [Mod1 in India]]]] [V2 eat meat]]

As the next pass yields no changes, the analysis is complete.

This particular system, as already indicated, makes no provision for
deriving several tree structures for a single sentence although it
avoids the problem of temporarily carrying additional analyses which
are later discarded.

2.2  Dependency 

A phrase structure or immediate constituency analysis of a
sentence may be viewed as a description of the relations among units
of varied complexity. A dependency analysis is a description of
relations among simple units, e.g., words. Descriptions of the formal
properties of dependency trees and their relationship to immediate
constituency trees can be found in the work of David Hays,³ and
Haim Gaifman.⁴ For the purpose of this paper, the notion of dependency
will be explained in terms of the information required by a dependency
parsing program.

The particular system described next performs a phrase structure
and dependency analysis simultaneously. The output of the program
is a dependency tree superimposed upon a phrase structure tree.

Fundamentally, dependency may be defined as the relationship of
an attribute to the head of the construction in which it occurs. In
exocentric constructions, the head is specified by definition. Table 3
contains a set of grammatical rules which are sufficient for both
phrase structure and dependency parsing. A symbol preceded by an
asterisk is considered to be the head of that construction.
Accordingly, in rule 1 of Table 3, Art0 + *N2 = N3, the Art0 unit is
dependent on the N2 unit. In rule 6 of Table 3, *N3 + V3 = S1, the
V3 unit is dependent on the N3 unit.
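
One way to picture how a head-marked rule yields a dependency is the following sketch. It is an assumption-laden illustration rather than the program itself: each rule is written as a tuple with an explicit head side (taken from the asterisks of Table 3), each unit is represented only by the index of its head word, and `combine` is a hypothetical helper.

```python
# (left code, right code, result, side of the asterisked head)
TABLE_3 = [
    ("Art0", "N2", "N3", "right"),
    ("Adj0", "N2", "N2", "right"),
    ("N1", "Mod1", "N1", "left"),
    ("V1", "N2", "V2", "left"),
    ("Prep0", "N3", "Mod1", "left"),
    ("N3", "V3", "S1", "left"),
]


def combine(rule, left_head, right_head):
    """Unite two units whose head words sit at the given word indices.

    Returns (result code, head word of the new unit, (dependent, governor)).
    """
    _left, _right, result, head_side = rule
    if head_side == "left":
        return result, left_head, (right_head, left_head)
    return result, right_head, (left_head, right_head)


# 'The'(0) + 'girl'(1) by rule 1: the article depends on the noun.
print(combine(TABLE_3[0], 0, 1))
# 'girl'(1) + 'wore'(2) by rule 6: the verb depends on the subject noun.
print(combine(TABLE_3[5], 1, 2))
```

The new unit inherits the head word of its asterisked member, so later unions continue to attach dependents to a single word rather than to a phrase.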

The method of performing a simultaneous phrase structure and
dependency analysis is similar to the one described in the previous
section. The additional feature is the cumulative computation of
the dependency relations defined by the rules in the grammar. An
example will be helpful in illustrating this point.

1.  Art0 + *N2 = N3
2.  Adj0 + *N2 = N2
3.  *N1 + Mod1 = N1
4.  *V1 + N2 = V2
5.  *Prep0 + N3 = Mod1
6.  *N3 + V3 = S1

Table 3

Dependency-Phrase Structure Rules

Consider the sentence:

'The girl wore a new hat.'

First the words in the sentence are numbered sequentially, and
word class assignments are made.

 Art0      N0        V0        Art0      Adj0      N0
 The       girl      wore      a         new       hat
 0         1         2         3         4         5

The sequential numbering of the words is used in the designation
of dependency relations. Looking ahead, the dependency tree that will
be derived will be equivalent to the following:

The -> girl     wore -> girl     hat -> wore     a -> hat     new -> hat

where the arrows indicate the direction of dependency. Another way
of indicating the same dependency analysis is in list fashion, each
word being associated with the number of the word it is dependent on.

 The       girl      wore      a         new       hat
 0         1         2         3         4         5
 1         -         1         5         5         2

Consider the computation of this analysis. The first two units,
Art0 + N0, are united by rule 1 of Table 3, Art0 + *N2 = N3. The
results will be indicated in a slightly different fashion than in
the examples of section 2.1.

 N3(1)    *N3(0)
*Art0     *N0       *V0       *Art0     *Adj0     *N0
 The       girl      wore      a         new       hat
 0         1         2         3         4         5
 1


All of the information concerning the constructions involving
a particular word will appear in a column above that word. Each such
word and the information above it will be called an entry. This
particular mode of description represents the parsing as it takes
place in the actual computer program.

The fact that Art0 + N0 form a unit is marked by the occurrence
of an N3 at the top of entries 0 and 1. The asterisk preceding the
N3 at the top of entry 1 indicates that this entry is associated with
the head of the construction. The asterisks associated with the
individual word tags indicate that at this level each word is the
head of the construction containing it. This last feature is necessary
because of certain design factors in the program.

The numbers in brackets adjacent to the N3 units indicate the
respective partners in the construction. Thus the (1) at the top of
entry 0 indicates that its partner is in entry 1, and the (0) at the
top of entry 1, the converse. The absence of an asterisk at the top
of entry 0 indicates that the number in brackets at the top of this
entry also refers to the dependency of the English words involved in
the construction; i.e., 'The' of entry 0 is dependent on 'girl' of entry
1. This notation actually makes redundant the use of lines to indicate
tree structure. They are plotted only for clarity. Also redundant
is the additional indication of dependency in list fashion at the
bottom of each entry. This information is tabulated only for clarity.

The next pair of units accounted for by the program is Adj0 + N0.
These, according to rule 2 of Table 3, are united to form an N2 unit.

 N3(1)    *N3(0)                         N2(5)    *N2(4)
*Art0     *N0       *V0       *Art0     *Adj0     *N0
 The       girl      wore      a         new       hat
 0         1         2         3         4         5
 1                                       5

Here 'new' is dependent on 'hat.'

On the next pass through the sentence, the N3 of entry 1, 'girl,'
is linked to the V0 of entry 2, 'wore,' to form an S1 unit. It is
worth noting that a unit not prefaced by an asterisk is ignored in
the rest of the parsing.

          *S1(2)     S1(1)
 N3(1)    *N3(0)                         N2(5)    *N2(4)
*Art0     *N0       *V0       *Art0     *Adj0     *N0
 The       girl      wore      a         new       hat
 0         1         2         3         4         5
 1                   1                   5

The new dependency emerging from this grouping is that of
'wore' upon 'girl.' The Art0 of entry 3 plus the N2 of entry 5
form the next unit combined, as indicated by rule 1 of Table 3.
Note that the N2 of entry 4 can be skipped because it is not
preceded by an asterisk. Adjacent asterisked units are the only
candidates for union.

          *S1(2)     S1(1)
 N3(1)    *N3(0)               N3(5)              *N3(3)
                                         N2(5)    *N2(4)
*Art0     *N0       *V0       *Art0     *Adj0     *N0
 The       girl      wore      a         new       hat
 0         1         2         3         4         5
 1                   1         5         5

On the next pass through the sentence, the V0 of entry 2 is
linked to the N3 of entry 5 to form, according to rule 4 of Table 3,
a V2 unit. The S1 unit, of which the V0 is already a part, is deleted
because the V2 grouping takes precedence. The result is:

                    *V2(5)                         V2(2)
 N3(1)    *N3(0)               N3(5)              *N3(3)
                                         N2(5)    *N2(4)
*Art0     *N0       *V0       *Art0     *Adj0     *N0
 The       girl      wore      a         new       hat
 0         1         2         3         4         5
 1                             5         5         2

The next pass completes the analysis, by linking the N3 of entry 1
with the V2 of entry 2 by rule 6 of Table 3.

          *S1(2)     S1(1)
                    *V2(5)                         V2(2)
 N3(1)    *N3(0)               N3(5)              *N3(3)
                                         N2(5)    *N2(4)
*Art0     *N0       *V0       *Art0     *Adj0     *N0
 The       girl      wore      a         new       hat
 0         1         2         3         4         5
 1         -         1         5         5         2

Note again that the dependency analysis may be read directly
from the phrase structure tree; the bracketed digit associated with
the top unasterisked phrase structure label in each entry indicates
the dependency of the word in that entry.
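
The reading-off procedure can be made concrete as follows. This is a sketch under assumptions: the entry columns are transcribed from the completed analysis of 'The girl wore a new hat,' each label is stored as a (starred, code, partner) triple, and `governor` is a hypothetical helper, not a routine of the original program.

```python
# Each entry is the column of labels above a word, listed from the top.
# A label is (starred, code, partner entry); word tags have no partner.
ENTRIES = [
    ("The",  [(False, "N3", 1), (True, "Art0", None)]),
    ("girl", [(True, "S1", 2), (True, "N3", 0), (True, "N0", None)]),
    ("wore", [(False, "S1", 1), (True, "V2", 5), (True, "V0", None)]),
    ("a",    [(False, "N3", 5), (True, "Art0", None)]),
    ("new",  [(False, "N2", 5), (True, "Adj0", None)]),
    ("hat",  [(False, "V2", 2), (True, "N3", 3), (True, "N2", 4),
              (True, "N0", None)]),
]


def governor(column):
    """Return the partner of the topmost unstarred label, or None."""
    for starred, _code, partner in column:
        if not starred and partner is not None:
            return partner
    return None


dependencies = [governor(column) for _word, column in ENTRIES]
heads = [word for (word, _), g in zip(ENTRIES, dependencies) if g is None]

print(dependencies)  # governor index for each word; None marks the head
print(heads)
```

The resulting list agrees with the list-fashion notation used earlier: every word except 'girl' has exactly one governor, and 'girl' alone has none.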

The only entry having no unasterisked form at the top is 1.
This implies that 'girl' is the head of the sentence. This choice of
the main noun subject instead of the main verb as the sentence head
is of significance in generating coherent discourse. The reasons for
this are indicated in section 3.2.

3.0  Generation

The discussion of generation is concerned with the production of
both nonsensical and coherent discourse.

3.1  Grammatically Correct Nonsense

The generation of grammatically correct nonsense may be
accomplished with the same type of phrase structure rules as in
Tables 2, 3 and 4. (The asterisks in Table 3 are not pertinent to
generation.) A computer program implementing a phrase structure
generation grammar of this sort has been built by Victor Yngve.⁵

The rules in Table 4 contain subscripts which, as in the parsing
system, control their order of application. The rules may be viewed
as rewrite instructions, except that the direction of rewriting is
the reverse of that in the parsing system.

Starting with the symbol for sentence, S1, N3 + V3 may be derived
by rule 6 of Table 4.

Note that a tree structure can be generated in tracing the
history of the rewritings. Leftmost nodes are expanded first. The
N3 unit may be replaced by the left half of rules 1, 2 or 3. If the
subscript of the N on the right half of these rules were greater than
3, they would not be applicable. This is the reverse of the condition
for applicability that pertained in the parsing system. Assume rule 1
of Table 4 is selected, yielding:

[S1 [N3 Art0 N2] V3]


A node with a zero subscript cannot be further expanded. All
that remains is to choose an article, at random, say 'the.' The N2
unit can still be expanded. Note that rule 1 is no longer applicable
because the subscript of the right hand member is greater than 2.
Suppose rule 2 of Table 4 is selected, yielding:

[S1 [N3 [Art0 The] [N2 Adj0 N2]] V3]


Now an adjective may be chosen at random, say, 'red.' The only
expansions of N2 are by rule 2 or 3, or rule 7, which makes it a
terminal node. Note that rule 2 is recursive: that is, it may be
used to rewrite a node repeatedly without reducing the value of the
subscript. Accordingly an adjective string of indefinitely great
length could be generated if rule 2 were chosen repeatedly. For the
sake of brevity, next let rule 7 of Table 4 be selected. A noun may
now be chosen at random, say, 'car,' yielding:

[S1 [N3 [Art0 The] [N2 [Adj0 red] [N0 car]]] V3]


Let the V3 be rewritten V1 + N2 by rule 4 of Table 4 and that
V1 rewritten as V0 by rule 8 of Table 4. Let the verb chosen for
this terminal node be 'eats.'

The only remaining expandable node is N2. Assume that N0 is
selected by rule 7. If the noun chosen for the terminal node is
'fish' the final result is:

[S1 [N3 [Art0 The] [N2 [Adj0 red] [N0 car]]] [V3 [V1 [V0 eats]] [N2 [N0 fish]]]]

The red car eats fish


With  no  restrictions  placed  upon  the  selection  of  vocabulary,  no 
control  over  the  semantic  coherence  of  the  terminal  sentence  is  possible. 
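
The generation procedure may be sketched as follows. The sketch rests on several assumptions that are not confirmed by the text: the generation grammar is taken to be the six rules of Table 2 read in reverse, plus two terminal rules numbered 7 and 8; a rule is taken to apply to a node when the classes agree and the subscript of the rule's result does not exceed that of the node; the vocabulary is a small invented sample; and the depth limit is a device of the sketch that keeps the recursive rules from running indefinitely.

```python
import random

# Assumed generation grammar: number -> (result code, left half).
REWRITES = {
    1: ("N3", ("Art0", "N2")),
    2: ("N2", ("Adj0", "N2")),
    3: ("N1", ("N1", "Mod1")),
    4: ("V2", ("V1", "N2")),
    5: ("Mod1", ("Prep0", "N3")),
    6: ("S1", ("N3", "V3")),
    7: ("N1", ("N0",)),  # terminal rule for nouns
    8: ("V1", ("V0",)),  # terminal rule for verbs
}

# Hypothetical sample vocabulary, one list per word class.
VOCAB = {
    "Art": ["the", "a"],
    "Adj": ["red", "tall"],
    "N": ["car", "fish", "man"],
    "V": ["eats", "rides"],
    "Prep": ["with", "in"],
}


def split(code):
    """Split 'Mod1' into ('Mod', 1); subscripts are single digits."""
    return code[:-1], int(code[-1])


def applicable(node):
    """Rules whose result has the node's class and a subscript <= the node's."""
    cls, sub = split(node)
    return [n for n, (result, _) in REWRITES.items()
            if split(result)[0] == cls and split(result)[1] <= sub]


def generate(node="S1", rng=random, depth=0, max_depth=6):
    """Expand `node` leftmost-first and return the list of chosen words."""
    cls, sub = split(node)
    if sub == 0:
        # A zero subscript marks a terminal node: pick a word at random.
        return [rng.choice(VOCAB[cls])]
    options = applicable(node)
    if depth >= max_depth:
        # Beyond the depth limit, prefer the terminal rules when available.
        options = [n for n in options if n in (7, 8)] or options
    _result, left_half = REWRITES[rng.choice(options)]
    words = []
    for child in left_half:
        words.extend(generate(child, rng, depth + 1, max_depth))
    return words


print(" ".join(generate(rng=random.Random(1))))
```

Because nothing constrains the choice of words, the sentences produced are grammatical but arbitrary in meaning, which is the point made above.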
3.2  Coherent Discourse

The output of a phrase structure generation grammar can be
limited to coherent discourse under certain conditions. If the
vocabulary used is limited to that of some source text, and if it is
required that the dependency relations in the output sentences not
differ from those present in the source text, then the output
sentences will be coherent and will reflect the meaning of the source
text. For the purpose of matching relations between source text and
output text, dependency may be treated as transitive, except across
prepositions other than 'of' and except across verbs other than forms
of 'to be.'

A computer program which produces coherent sentence paraphrases
by monitoring of dependency relations has been described elsewhere.⁶,⁷
An example will illustrate its operation. Consider the text:

'The man rides a bicycle. The man is tall. A bicycle is a vehicle
with wheels.'

Assume each word has a unique grammatical code assigned to it.

The   man   rides   a     bicycle
Art   N     V       Art   N

The   man   is   tall
Art   N     V    Adj

A     bicycle   is   a     vehicle   with   wheels
Art   N         V    Art   N         Prep   N

A dependency analysis of this text can be in the form of a
network or a list structure. In either case, for purposes of
paraphrasing, two-way dependency links are assumed to exist between
like tokens of the same noun. A network description would appear as
follows:

[Network diagram: each word is joined to the word on which it depends;
the two tokens of 'man' and the two tokens of 'bicycle' are joined by
two-way links.]

In list form, each word is numbered and followed by the number of the
word or words on which it depends:

 0     1     2       3     4          5     6     7     8
 The   man   rides   a     bicycle.   The   man   is    tall.
 1     6     1       4     2,10       6     1     6     7

 9     10        11    12    13        14     15
 A     bicycle   is    a     vehicle   with   wheels
 10    4         10    13    11        13     14

The paraphrasing program described would begin with the selection

This generation program, in contrast with the method described
above, chooses lexical items as soon as a new slot appears; for example,
the main subject and verb of the sentence are selected now, while they
are adjacent in the sentence tree. Assume that 'bicycle' is selected
as the noun for N3.

[S1 [N3 bicycle] [V3 ]]


It is now necessary to find a verb directly or transitively dependent
on 'bicycle.' Inspection of either the network or list representation
of the text dependency analysis shows no verb dependent on 'bicycle.'
The computer determines this by treating the dependency analysis as a
maze in which it seeks a path between each verb token and the word
'bicycle.' Accordingly, the computer program requires that another noun
be selected in its place; in this case, 'man.'

[S1 [N3 man] [V3 ]]

The program keeps track of which token of 'man' is selected.
It is now necessary to choose a verb dependent on 'man.' Let
'rides' be chosen.

[S1 [N3 man] [V3 rides]]

Now the N3 may be expanded. Suppose rule 1 of Table 4 is chosen.

[S1 [N3 Art0 [N2 man]] [V3 rides]]

Note that 'man' is associated with the new noun phrase node, N2.

It is now necessary to select an article dependent on 'man.'
Assume 'a' is selected. While a path from 'a' to 'man' does seem to
exist in the dependency analysis, it crosses 'rides,' which is a
member of a verb class treated as an intransitive link. Accordingly,
'a' is rejected. Either token of 'the' is acceptable, however.
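
The maze search can be illustrated with a short sketch. It is not the program described here: the governor list is transcribed from the list-form analysis of the three-sentence text (with the two-way noun links included), the set of forms of 'to be' is supplied by the sketch, and the function names are hypothetical. The sketch implements only the path rule stated above, namely that a path may not pass through a verb other than a form of 'to be' or a preposition other than 'of'; it does not attempt to reproduce every selection decision in the walkthrough.

```python
from collections import deque

TOKENS = ["The", "man", "rides", "a", "bicycle",
          "The", "man", "is", "tall",
          "A", "bicycle", "is", "a", "vehicle", "with", "wheels"]
TAGS = ["Art", "N", "V", "Art", "N",
        "Art", "N", "V", "Adj",
        "Art", "N", "V", "Art", "N", "Prep", "N"]
# GOVERNORS[i]: tokens on which token i depends, including the two-way
# links between like noun tokens (1 <-> 6 and 4 <-> 10).
GOVERNORS = {
    0: [1], 1: [6], 2: [1], 3: [4], 4: [2, 10],
    5: [6], 6: [1], 7: [6], 8: [7],
    9: [10], 10: [4], 11: [10], 12: [13], 13: [11], 14: [13], 15: [14],
}
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def is_barrier(i):
    """A path may not continue through these tokens."""
    word = TOKENS[i].lower()
    if TAGS[i] == "V":
        return word not in BE_FORMS
    if TAGS[i] == "Prep":
        return word != "of"
    return False


def depends_on(start, target):
    """True if `start` is directly or transitively dependent on `target`."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for gov in GOVERNORS.get(current, []):
            if gov == target:
                return True
            if gov not in seen and not is_barrier(gov):
                seen.add(gov)
                queue.append(gov)
    return False


print(depends_on(3, 1))   # 'a' on 'man': the only path crosses 'rides'
print(depends_on(5, 1))   # second 'The' on first 'man', via the noun link
print(depends_on(13, 2))  # 'vehicle' on 'rides', via 'is' and 'bicycle'
```

A breadth-first search is used here only because it is simple to verify; any exhaustive path search over the dependency links would serve equally well.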


(Note that for simplicity of presentation no distinction among verb
classes has been made in the rules of Tables 2 - 4.)

[S1 [N3 [Art0 The] [N2 man]] [V3 rides]]

The Art0 with a zero subscript cannot be further expanded.
Let the N2 be expanded by rule 2 of Table 4.

Let 'N0' be chosen as the next expansion of N2, by rule 7. Now
the only node that remains to be expanded is V3. If rule 4 of Table 4
is chosen, the part of the tree pertinent to 'rides' becomes:


A noun dependent on 'rides' must now be found. Either token of
'man' would be rejected. If 'vehicle' is chosen, a path does exist
that traverses a transitive verb 'is' and two tokens of 'bicycle.'

Let 'V0' be chosen as the rewriting of V1 by rule 8 of Table 4,
and let the N node be rewritten by rule 1 of Table 4. The pertinent
part of the tree now appears as follows:

[V3 [V1 [V0 rides]] [N [Art0 ] [N2 vehicle]]]


Assume that 'a' is chosen as the article and that the N2 is
rewritten as N1 + Mod1 by rule 3 of Table 4. The result is:

[S1 [N3 [Art0 The] [N2 [Adj0 tall] [N0 man]]]
    [V3 [V1 [V0 rides]] [N [Art0 a] [N2 [N1 vehicle] [Mod1 ]]]]]

The Mod1 is purely a slot marker, and no vocabulary item is
selected for it. If the Mod1 is rewritten Prep0 + N3 by rule 5 of
Table 4, 'with' would be selected as a preposition dependent on
'vehicle,' and 'wheels' as a noun dependent on 'with.' After the
application of rule 7, the N3 would be rewritten N0, completing the
generation.

[Mod1 [Prep0 with] [N3 [N0 wheels]]]

Or, 'The tall man rides a vehicle with wheels.'

In cases where no word with the required dependencies can be
found, the program in some instances deletes the pertinent portion
of the tree, in others, completely aborts the generation process. The
selection of both vocabulary items and structural formulas is done
randomly.

4.0  An Essay Writing System

Three computer programs were described in sections 2 and 3.
The first performs a unique dependency and phrase structure analysis
of individual sentences in written English text, the vocabulary of
which has received unique grammar codes. The power of this program
is limited to the capabilities of an extremely small recognition
grammar.

The second program generates grammatically correct sentences
without control of meaning. The third program consists of a version
of the second program coupled with a dependency monitoring system that
requires the output sentences to preserve the transitive dependency
relations existing in a source text. A unique dependency analysis
covering relations both within and among text sentences is provided as
part of the input. The outputs of this third program are grammatically
correct, coherent paraphrases of the input text which, however, are
random with respect to sequence and repetition of source text content.

What is called an 'essay' writing system in this section consists
of all the programs described earlier, plus a routine for assigning
dependency relations across sentences in an input text and a routine
which insures that the paraphrase sentences will appear in a logical
sequence and will not be repetitious with respect to the source text
content. Still another device is a routine that permits the generation
of a paraphrase around an outline supplied with a larger body of text.
In addition, several generative devices have been added: routines
for using subject and object pronouns even though none occur in the
input text, routines for generating relative clauses, although, again,
none may occur in the input text, and a routine for converting source
text verbs to output text forms ending in '-ing.'

4.1  Dependency  Analysis  of  an  Entire  Discourse 

After the operation of the routine that performs a dependency
and phrase structure analysis of individual sentences, it is necessary
for another program to analyze the text as a unit to assign dependency
links across sentences and to alter some dependency relations
for the sake of coherent paraphrasing. The present version of the
program assigns two-way dependency links between like tokens of the
same noun. A future version will be more restrictive and assign such
links only among tokens having either similar quantifiers, determiners,
or subordinate clauses, which are determined to be equatable by
special semantic rules. This is necessary to insure that each token
of the same noun has the same real world referent.
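
The cross-sentence linking step described above might be sketched as follows. This is a minimal illustration in Python, not the original JOVIAL routine; the token representation (position, word, word class) and the function name `link_like_nouns` are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def link_like_nouns(tokens):
    """Return two-way links between all tokens of the same noun.

    `tokens` is a list of (position, word, word_class) triples in text
    order (a hypothetical representation). Each link is a pair
    (from_position, to_position); both directions are stored.
    """
    positions_by_noun = defaultdict(list)
    for position, word, word_class in tokens:
        if word_class == "noun":
            positions_by_noun[word.lower()].append(position)

    links = set()
    for positions in positions_by_noun.values():
        for a, b in combinations(positions, 2):
            links.add((a, b))  # earlier token to later token
            links.add((b, a))  # and back again: the link is two-way
    return links


# Three short sentences treated as one discourse.
text = [
    (0, "John", "noun"), (1, "met", "verb"), (2, "Mary", "noun"),
    (3, "John", "noun"), (4, "married", "verb"), (5, "Mary", "noun"),
    (6, "Mary", "noun"), (7, "loved", "verb"), (8, "John", "noun"),
]
links = link_like_nouns(text)
```

The more restrictive future version mentioned in the text would add a test on quantifiers, determiners, and subordinate clauses before a pair of tokens is linked; that test is not sketched here.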

While simple dependency relations are sufficient for paraphrasing
the artificially constructed texts used in the experiments described
in this paper, paraphrasing of unrestricted English text would demand
special rule revisions with respect to the direction and [illegible] of
the dependency relation. The reason for this is easily understood by
a simple example familiar to transformationalists.

'The cup of water is on the table'

[Dependency tree diagram for the sentence above]

'The King of Spain is in France'

[Dependency tree diagram for the sentence above]

The parsing system would yield the same type of analysis for each
sentence. Yet it would be desirable to be able to paraphrase the
first sentence with:

'The water is on the table'

without the possibility of paraphrasing the second sentence with:

'Spain is in France.'

Accordingly, a future modification of the routine described in
this section would, after noting the special word classes involved,
assign two-way dependency links between 'cup' and 'of' and also between
'of' and 'water,' but take no such action with the words 'King,' 'of,' and
'Spain' in the second sentence. This reparsing of a parsing has
significance for a theory of grammar, and its implications with respect to
stratificational and transformational models are discussed in section 5.

4.2  Paraphrase  Formatting 

Control over sequence and nonrepetition of the paraphrase sentences
is obtained through the selection of an essay format. The
format used in the experiments performed consists of a set of paragraphs,
each of which contains only sentences with the same main subject.
The ordering of the paragraphs is determined by the sequence of nouns
as they occur in the source text. The ordering of sentences within
each paragraph is partially controlled by the sequence of verbs as
they occur in that text.

Before the paraphrasing is begun, two word lists are compiled
by a subroutine. The first list contains a token of each source
text noun that is not dependent on any noun or noun token occurring
before it in the text. The tokens are arranged in source text order.

The second list consists of every token of every verb in the text.
The first noun on the noun list is automatically selected as the main
subject noun for each sentence that is to be generated. As many
generations are attempted as there are verbs on the verb list. The main
verb for each such sentence generation attempt is taken in sequence
from those on the list. Once a sentence is successfully generated, the
token of the verb used is deleted from the verb list. Nonsequential
use of verbs can occur in relative clauses or modifying phrases; in
these instances also, the verb or verb stem tokens used are deleted
from the verb list. When every verb on the list has been tried as the
main verb for a particular main subject noun, a new paragraph is begun
and the next noun on the list becomes the main subject for each
sentence. The process is continued until the noun list is exhausted.
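
The control loop just described can be illustrated with a short Python sketch. It is a simplified reading of the procedure, not the original program: the sentence generator is passed in as a function, and the names `format_essay`, `try_generate`, `FACTS`, and `toy_generate` are hypothetical. Verb tokens are written as (index, word) pairs so that repeated verbs remain distinct.

```python
def format_essay(noun_list, verb_list, try_generate):
    """Build paragraphs, one per main subject noun.

    `try_generate(subject, verb)` stands in for the sentence generator
    and returns either None (generation failed) or a pair
    (sentence, verbs_used), where verbs_used lists every verb token
    consumed, including any used in relative clauses or modifiers.
    """
    remaining = list(verb_list)          # verb tokens not yet used
    paragraphs = []
    for subject in noun_list:            # nouns in source text order
        paragraph = []
        for verb in list(remaining):     # one attempt per listed verb
            if verb not in remaining:    # consumed by an earlier sentence
                continue
            result = try_generate(subject, verb)
            if result is None:
                continue
            sentence, verbs_used = result
            for used in verbs_used:      # delete every verb token used
                if used in remaining:
                    remaining.remove(used)
            paragraph.append(sentence)
        if paragraph:                    # nouns yielding nothing get no paragraph
            paragraphs.append(paragraph)
    return paragraphs


# A toy generator that only knows three subject and verb pairings.
FACTS = {
    ("John", (0, "met")): "John met Mary.",
    ("John", (1, "married")): "John married Mary.",
    ("Mary", (2, "loved")): "Mary loved John.",
}


def toy_generate(subject, verb):
    sentence = FACTS.get((subject, verb))
    return None if sentence is None else (sentence, [verb])


essay = format_essay(
    ["John", "Mary"],
    [(0, "met"), (1, "married"), (2, "loved")],
    toy_generate,
)
```

Because successful verbs are deleted from the shared list, no source verb token is used twice, which is what keeps the paraphrase nonrepetitious in this sketch.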

It may happen that some nouns do not appear as subjects of paragraphs
even though they appear on the noun list, because they do not occur
as main subjects in the source text. (This procedure was arbitrarily
selected as suitable for testing the program; other formats for essay
generation can be implemented.)

The use of an outline as the basis for generating an essay from
a larger body of text is accomplished simply: the boundary between
the outline and the main body of text which follows is marked. The
noun list is limited only to those nouns occurring in the outline.

The verbs selected still include those in the main text as well as
the ones in the outline. Theoretically, the main text could consist
of a large library; in that case the outline might be viewed as an
information retrieval request. The output would be an essay limited
to the subject matter of the outline but drawn from a corpus indefinitely
large in both size and range of subject matter.

4.3 Generation of Word Forms Not Present in the Source Text

Earlier experiments indicated that in many instances reasonable
paraphrases could be performed with the method described herein if
the dependency relations held only among stems rather than among full
word forms and if the stems were subsequently converted to forms of
the proper grammatical category. The present system will accept a
verb form with proper dependency relations and use it in a form ending
in '-ing' when appropriate.

Relative clauses may be generated even though no relative pronouns
occur in the source text. Where the generation process requires
a relative pronoun, 'who' or 'which' is inserted into the proper slot
depending on the gender of the appropriate antecedent. All the
descriptors of the antecedent are then assigned to the relative pronoun.

As far as the operation of all programs is concerned, the pronoun is
its antecedent. Accordingly, if a routine is to inquire whether a
particular verb is dependent on a relative pronoun, the request is
formulated in terms of the verb's dependency on the antecedent of
the relative pronoun.
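
The treatment of the relative pronoun as a stand-in for its antecedent might look like the following Python sketch. The `Token` record, the gender labels, and the rule that neuter antecedents take 'which' while others take 'who' are assumptions made for illustration; the report states only that the choice depends on the gender of the antecedent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    form: str
    gender: str                           # "masc", "fem", or "neut" (assumed labels)
    number: str = "sg"
    antecedent: Optional["Token"] = None  # set only for pronouns


def make_relative_pronoun(antecedent: Token) -> Token:
    """Create 'who' or 'which' carrying the antecedent's descriptors."""
    form = "which" if antecedent.gender == "neut" else "who"
    return Token(form, antecedent.gender, antecedent.number, antecedent)


def resolve(token: Token) -> Token:
    """Follow antecedent pointers: the pronoun 'is' its antecedent."""
    while token.antecedent is not None:
        token = token.antecedent
    return token


def depends_on(dependencies, verb: str, token: Token) -> bool:
    """Ask whether `verb` depends on `token`, phrased via the antecedent."""
    return (verb, resolve(token).form) in dependencies


helen = Token("Helen", "fem")
castle = Token("castle", "neut")
who = make_relative_pronoun(helen)
which = make_relative_pronoun(castle)
source_dependencies = {("loved", "Helen")}   # hypothetical dependency pairs
```

With this arrangement no routine needs a special case for relative pronouns: a query about 'who' is answered from the dependency data recorded for 'Helen'.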

The system may also generate subject and object pronouns although
such forms do not occur in the source text. The use of subject and
object pronouns is accomplished by separate routines. Subject pronouns
may be used randomly at a frequency that may be controlled by input
parameters. After the occurrence of the first sentence in a paragraph,
a subject pronoun of appropriate gender and number may be used as the
main subject of subsequent sentences within the paragraph if program
generated random numbers fall within a specified range.

The occurrence of an object pronoun of appropriate number and
gender is obligatory whenever a nonsubject noun would normally be
identical with the last nonmain subject noun used. A special storage
unit containing the last nonmain subject noun used gives the program
easy recognition of the need for a pronoun.
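
The two pronoun routines can be sketched in Python as follows. The pronoun tables, the half-open range test on the random number, and the names `choose_subject` and `ObjectPronounizer` are assumptions for the example; the report specifies only that the subject pronoun is optional and governed by a parameter range, while the object pronoun is obligatory.

```python
import random

SUBJECT_PRONOUNS = {("masc", "sg"): "He", ("fem", "sg"): "She", ("neut", "sg"): "It"}
OBJECT_PRONOUNS = {("masc", "sg"): "him", ("fem", "sg"): "her", ("neut", "sg"): "it"}


def choose_subject(noun, gender, number, sentence_index, rng, low, high):
    """Optionally replace the main subject by a pronoun.

    Never applies to the first sentence of a paragraph (index 0).
    Afterwards the pronoun is used when a random draw falls in the
    range [low, high), which plays the role of the input parameter.
    """
    if sentence_index > 0 and low <= rng.random() < high:
        return SUBJECT_PRONOUNS[(gender, number)]
    return noun


class ObjectPronounizer:
    """Keeps the last nonmain subject noun used (the 'storage unit')."""

    def __init__(self):
        self.last_noun = None

    def render(self, noun, gender, number):
        if noun == self.last_noun:       # repetition: pronoun is obligatory
            return OBJECT_PRONOUNS[(gender, number)]
        self.last_noun = noun
        return noun


rng = random.Random(0)
objects = ObjectPronounizer()
first = objects.render("child", "neut", "sg")    # "child"
second = objects.render("child", "neut", "sg")   # "it"
```

This mirrors the pattern seen in Output Text I, where 'She had a child' is followed by 'She raised it.'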

4.4 Computer Generated Essays

A number of essays were produced from varied texts, all of which
were specially constructed so as to be suitable for parsing by a
small dependency structure grammar. The parsing recognition
grammar is contained in Table 5. (Because the material covered
forms a related whole, Table 5 and all subsequent tables are gathered
in an appendix at the end of this document.) The generation grammar is
shown in Table 6. The recognition grammar is more powerful than the
generation grammar. The first input made no use of an outline;
more exactly, because the program anticipates the presence of an
outline, the entire text was its own outline. Input Text I is contained
in Table 7, part 1. Its essay paraphrase, Output Text I, is contained
in Table 7, part 2. Note that the generation rules used in producing
Output Text I do not contain the rule for producing forms ending in '-ing.'
The use of this rule and the associated device for converting verbs
to forms ending in '-ing' is illustrated in Output Texts III and IV, which
appear in Tables 10 and 11.

Unambiguous word class assignments were part of the input data.
As an example, the first sentence of Input Text I, Table 7, was coded:

Clever (adj.) John (noun, masc., sg.) met (verb, 3rd pers. sg.)
Mary (noun, fem., sg.) in (prep.) the (art.) park (noun, neut., sg.).

Capital letters were indicated by a '+' sign preceding the first
letter of a word because a computer does not normally recognize such
forms. The presence of an initial capital letter with a word coded
'noun' provided the program with information sufficient to distinguish
such forms as belonging to a separate class. Two verb classes were
distinguished in the recognition grammar, forms of 'to be' and all
others; also, two preposition classes were established, 'of' and all
others. Ad hoc word class assignments were made in the case of
'married' in Input Text I, Table 7, which was treated as a noun, and
the case of 'flamenco' in Input Text II, Table 9, which was labeled
an adjective. In each case this was done in order to avoid a more
complicated generation grammar. A price was paid for this simplification,
as can be seen in the phrase 'Flamenco Helen' generated in Output Text II,
Table 9. The uncapitalized form of 'bentley' which appears in several
of the later paraphrases is not a typographical error, but rather is
intended to reflect the use of capitalization to distinguish a separate
word class. In order not to assign 'bentley' to the same class as
'John' it was left uncapitalized. (The device is not wholly adequate.)
The noun classes differentiated by the presence or absence of a prefixed
'+' were manipulated directly within the program rather than by special
rules for each class. The program prevented a form prefixed by a '+'
from taking an article and from being followed by a form ending in '-ing.'
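
The '+' convention and the two restrictions attached to it can be illustrated with a small Python sketch. The function names and the lowercase coded input are hypothetical; only the behavior (a '+' marks a capital, a capitalized noun forms its own class, and such a form takes no article and no following '-ing' form) is taken from the text.

```python
def decode(coded):
    """Return (surface_form, was_capitalized) for a '+'-coded word."""
    if coded.startswith("+"):
        return coded[1:].capitalize(), True
    return coded, False


def is_capitalized_noun(coded, word_class):
    """A noun with a '+' prefix belongs to the separate (proper) class."""
    return word_class == "noun" and coded.startswith("+")


def may_take_article(coded, word_class):
    """Only nouns without the '+' prefix may be preceded by an article."""
    return word_class == "noun" and not coded.startswith("+")


def may_precede_ing_form(coded, word_class):
    """A '+'-prefixed noun may not be followed by a form ending in '-ing'."""
    return not is_capitalized_noun(coded, word_class)


coded_sentence = [
    ("+clever", "adj"), ("+john", "noun"), ("met", "verb"),
    ("+mary", "noun"), ("in", "prep"), ("the", "art"), ("park", "noun"),
]
surface = " ".join(decode(word)[0] for word, _ in coded_sentence) + "."
```

Note that '+clever' is capitalized but is not a noun, so it does not enter the separate class, while 'bentley' without a '+' behaves like an ordinary noun and may take an article.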

It should be noted that the spacing of the output texts in Table 7
and beyond is edited with respect to spacing within paragraphs. Only
the spacing between paragraphs is similar to that of the original output.

Table 8 contains an essay paraphrase generated with the requirement
that only the converse of Input Text I dependencies be present
in the output.

5.0  Discussion 

There are several comments that can be made about the essay
writing program with respect both to the functioning of the programs
and to the implications for linguistic theory suggested by the results.

5.1 Program

The compiled program occupies about 12,000 registers of
Philco 2000 core storage, approximately [figure illegible] registers of
which are devoted to tables. The JOVIAL program contains approximately
750 statements. Because of space limitations, the largest text the system
can paraphrase is 300 English words, counting periods as words.
One early version of the system took an hour and a half to paraphrase
150 words of text; various attempts were made to control this
processing time. Two programming devices used in this effort are
described below.

Because the generation process involves a search of a network
(the dependency structure of the text), the processing time would be
expected to increase exponentially with text size. The factors
that control the exponential rate of growth, besides text length, are
the amount of connectivity among words and the syntactic complexity
required of the sentences generated. Text that seldom repeats tokens
of nouns would yield a nearly linear network, and the exponential
increase of processing time per word with respect to length would not be
noticed for short texts. However, the texts paraphrased in this
paper had a fairly high frequency of repetition of noun tokens. The
network representing the dependencies was made relatively linear by
having the program link a noun token only to its immediately preceding
token. Because dependency is transitive, all computed results were
the same as if each token of a noun were linked to every other token
of the same noun. Because of this linking convention, the dependency
network was sufficiently linear to keep the rate of increase linear
with respect to text length, at least for the examples used in this
paper.
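
The equivalence claimed for the linking convention can be checked with a short Python sketch. It compares chain linking (each token linked only to the token immediately before it) with complete linking (every token linked to every other) and shows that, once transitivity is taken into account, both connect the same set of tokens. The token positions and function names are hypothetical.

```python
from itertools import combinations


def chain_links(positions):
    """Link each noun token only to its immediately preceding token."""
    return list(zip(positions, positions[1:]))


def all_pair_links(positions):
    """Link every token of the noun to every other token."""
    return list(combinations(positions, 2))


def reachable(links, start):
    """Tokens connected to `start` when two-way links are followed transitively."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


john_positions = [0, 3, 8, 14]          # four tokens of the same noun
chain = chain_links(john_positions)     # 3 links
complete = all_pair_links(john_positions)  # 6 links
```

For n tokens of a noun the chain needs n - 1 links where complete linking needs n(n - 1)/2, which is why the chained network stays close to linear while yielding the same transitive results.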


July 21, 1964    -45-    SP-1602/001/00


Another device contributing to the reduction of processing time
is tree pruning. The program generates a tree. If a subconstruction
is initiated that cannot be carried to completion, it is often
deleted without abandonment of the remainder of the generation tree.
Unrealizable adjectives are among the units pruned. The addition
of a routine to prune modifying phrases reduced the processing time
to approximately 10% of the time required without the routine when
the system was set to favor text with numerous modifying phrases.

The average time for generating an essay from an input of about
150 words is now 7 to 15 minutes, depending on the syntactic complexity
required of the output. The processing time for producing a text
from a 50-word source is about 1.5 minutes. From these figures it
can be seen that the processing time per word increases linearly with
the length of the text: 1.8 seconds per word for a 50-word text input,
about 4.5 seconds a word for a 150-word text input.

5.2 Theoretical Implications

The present version of the automatic essay writing system could
not operate satisfactorily with unrestricted English text as input.

For it to do so would require refinement of the dependency analysis,
which was derived from immediate constituency considerations. As
indicated earlier, reassignment of dependency links on the basis of
the presence of numerous special word classes would be necessary.

The problem presented by the necessity for recognizing multiple
parsings of English sentences remains as another major hurdle.

The fact that verbs having appropriate dependency relations
in source texts were satisfactorily used as '-ing' forms in paraphrases
suggests a more general system in which input text words belonging to
a variety of grammatical classes could be converted to new forms in
output text by the appropriate application of what might be described
as inflectional and derivational processes.

Such a system would have significance for linguistic theory.

Even the system described earlier accomplishes the work of a number of
transformations without using any. While a transformational grammar
might be used to produce paraphrases beyond the scope of this system,
the work of many transformations can be accomplished with a simpler
conceptual framework. A transformation is an empirical generalization
about a relationship between strings. In contrast, a transitive
interpretation of dependency relations can often be used to predict the
relationship a transformation represents.

Even though the essay writer described in this paper is an applied
system, any complete theory of grammar should be able to account for
its operation. I do not believe that transformational theory can do
so.

A stratificational model of language might have more explanatory
power. If, as in Sydney Lamb's model, one posits the existence of
a sememic stratum above a lexemic one, an explanation can be provided.
Dependency relations may be viewed as a lexemic counterpart of tactic
relations among sememes. A dependency structure defining relations
among lexemic units would have many very similar counterparts on the
sememic stratum, somewhat as a listing of allomorphs in a language
might resemble a listing of morphemes. The experiments described
operated under conditions where the dependency structure was a close
approximation to the semotactic structure which is posited as being
the proper domain for manipulating meaning relations between one text
and another. The first dependency analysis is analogous to lexotactic
analysis. A refinement of this analysis might correspond to a semotactic
analysis. Conceivably, a sufficiently refined system might come to
resemble a dynamic implementation of a stratificational model.
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1  Ai%  +  -  t2 

2.  WJq  ”  ®i_ 

3.  «2  ♦  a»co«u  - 

k.  ng  +  mo^  -  nu 

5.  Ro-*! 

6.  vx  ♦  -  »2 

7.  v0.vx 

8.  Partg  ♦  *3  ”  Mo4j_ 

9.  PTep.  ♦  »  Mod^ 

10.  \*Vk-\ 

11.  I  rUc:^  *  V2  .  SbCn1 

Table 6

Clever John met Mary in the park. John married Mary.
Mary loved John. Mary wanted a child. Mary had a
child. Mary raised a child. John was a successful
businessman who worked for a corporation. Mary was
penniless. John secretly loved Helen who was beautiful.
Helen who also loved John was married to Peter. Mary
was a friend of Helen. Peter was a buddy of John.

Helen who was friendly often ate lunch with Mary. John
played golf with Peter. John wanted Helen. Helen wanted
John. Divorce was impossible. The solution was simple.
John liked Mary. Helen liked Peter. John killed Peter.
Helen killed Mary. The end was happy.


Table  7,  Part  1 
Input Text I

John who married penniless Mary met her. Clever John
was a businessman. He loved friendly Helen. He played
golf. He wanted Helen. John who killed a buddy liked
penniless Mary.

Mary in the park who wanted a child loved clever John.
She had a child. She raised it. She was a friend of
friendly beautiful Helen.

Beautiful Helen loved successful John. Beautiful Helen
was married. Helen who wanted John ate lunch. She
liked a buddy. She killed Mary.

Peter was a buddy.


Table 7, Part 2
Output Text I

John loved Mary. John loved Helen. He wanted
her.

Mary who married John met him. Mary who killed
Helen liked John.

Child wanted Mary. It had her. It raised her.
Helen loved John. She wanted him.

Peter who killed him liked Helen.

Lunch ate her.

Golf played John with Peter.


Table 8

Paraphrase of Input Text I Using Converse of Dependencies




(Outline)


(Main Text)

Clever John met Mary in the park. John married Mary.
Mary loved John. Mary wanted a child. Mary had a
child. Mary raised a child. John was a successful
businessman who worked for a corporation. Mary was
penniless. John secretly loved Helen who was beautiful.
Helen who also loved John was married to Peter. Mary
was a friend of Helen. Peter was a buddy of John.

Helen who was friendly often ate lunch with Mary. John
played golf with Peter. John wanted Helen. Helen wanted
John. Divorce was impossible. The solution was simple.
John liked Mary. Helen liked Peter. John killed Peter.
Helen killed Mary. The end was happy.

A businessman is a man who likes money. John was a
gangster. Peter was a bullfighter. Mary was a countess.
Helen was a flamenco dancer. [sentence illegible]

A gangster commits crimes. A bullfighter fights bulls.
Bulls are dangerous animals. The gangster drives a
bentley. The flamenco dancer has many admirers. The
countess owns a castle.


Table 9, Part 1
Input Text II


John who married penniless Mary met her. Clever John
who commits crimes was a businessman. Clever John who
drives a bentley loved a flamenco dancer. John played
golf. He wanted Helen. Clever John who killed Peter
liked Mary. John who likes money is a man. Clever
John was a gangster.

Mary loved a successful businessman. Mary who was a
countess wanted a child. Penniless Mary had it.
Penniless Mary raised it. She was a friend. Mary in
the park owns a castle.

Flamenco Helen loved clever John. She was married.
She ate lunch with Mary. Helen wanted John. She liked
Peter. Helen killed a countess. Helen who has many
admirers was a dancer.

Peter who fights bulls was a buddy of John. He was
a bullfighter.


Table 9, Part 2
Output Text II


(Outline)  The hero is Peter. The unfaithful husband is
John who commits murder.

(Main text)  John was a gangster. The gangster drives a
bentley. A gangster commits crimes. John
was a successful businessman who works for a
corporation. Bulls are dangerous animals.
Peter was a bullfighter. A bullfighter fights
bulls.

Table 10, Part 1
Input Text III


A hero fighting bulls is Peter. He was a
bullfighter.

The husband committing murder is successful
John who was a gangster driving a bentley.

A husband commits crimes. The successful
unfaithful husband is a successful businessman.

Table 10, Part 2
Output Text III

With Conversion of Source Text Verbs to Forms in '-ing'




(Outline)  The hero is Peter. The homewrecker is Helen. The
unfaithful husband is John who commits murder. The
housewife is Mary.

(Main text)  John is a successful businessman who works for a
corporation. A businessman is a man who likes money.
John was a gangster. Peter was a bullfighter. Mary
was a countess. Helen was a dancer. A gangster commits
crimes. A bullfighter fights bulls. Bulls are dangerous
animals. The gangster drives a bentley. The dancer has
many admirers. The dancer wears a hat. The countess
owns a castle. John secretly loved Helen who was
beautiful. Helen who also loved John was married to
Peter. John wanted Helen. Helen wanted John.
Divorce was impossible. The solution was simple. John
killed Peter. Helen killed Mary. The end was happy.

Table 11, Part 1
Input Text IV

A hero fighting bulls is Peter. He was a
bullfighter.

The beautiful homewrecker who wanted a gangster who
commits crimes is Helen. The homewrecker was a [illegible]
has many admirers. She wears a hat. She loved successful
John who loved the dancer. A beautiful homewrecker was
married. She killed Mary who owns a castle.

An unfaithful husband liking money is the gangster
driving a bentley. He commits murder. The unfaithful
husband working is a successful businessman. He is
a man. The husband was a gangster. The unfaithful husband
wanted Helen. The husband killed Peter.

Table 11, Part 2
Output Text IV

With Conversion of Verbs to Forms Ending in '-ing'


UNCLASSIFIED

System Development Corporation,
Santa Monica, California

AUTOMATIC PARAPHRASING IN ESSAY FORMAT.

Scientific rept., SP-1602/001/00, by
S. Klein. 21 July 1964, 60p., 11 tables

Unclassified report

DESCRIPTORS: Language.

Describes an operating computer program
that accepts as input an essay of up to
300 words in length, and yields as output
an essay-type paraphrase that is a
nonredundant summary of the content of
the source text. Reports that
although no transformations are
used, the content of several sentences
in the input text may be combined into
a sentence in the output. Further
reports that the format of the output
essay may be varied by adjustment of
program parameters, and that the system
occasionally inserts subject or object
pronouns in its paraphrases to avoid
repetitious style.


